ABSTRACT


This thesis attempts to authenticate a smartphone user by pattern of life, based on the user's geolocation throughout the course of a day. Current smartphone technology uses the global positioning system (GPS) as the primary source of geolocation because of its accuracy. However, services such as Google Location Services and Skyhook use received signal strength indicator (RSSI)-based geolocation in GPS-degraded environments, such as inside a building. Using its Wi-Fi application programming interface, a smartphone detects the Wi-Fi signals of all wireless access points in range and their associated signal strengths over a discrete time interval. A hidden Markov model is used to model individual smartphone users and serves as the authentication method. The average f-scores from the experiments ranged between 0.76 and 0.80, well above the 0.20 baseline. It is feasible to use RSSI-based geolocation as one element, in combination with other methods, to continuously authenticate a smartphone user. For an authentication method to be acceptable, however, the evaluation criteria must be as close to 1.0 as possible. Future research could combine authentication from RSSI-based geolocation with gait and keystroke analysis to improve results by leveraging other sensors on a smartphone.
I. INTRODUCTION


Current smartphone technology uses the global positioning system (GPS) as a primary source of geolocation. GPS provides localization accuracy within 7.8 meters and continuous availability from a constellation of 24 satellites. GPS requires line-of-sight acquisition of at least 3 satellites in order to calculate a receiver's current location. However, depending on the receiver's antenna gain, the signals from the GPS satellites can be impeded by inclement weather or by obstructions such as buildings or mountains [1]. The type of GPS receiver in smartphones varies by vendor, so satellite acquisition times range from seconds to minutes [2].

Services like Skyhook and Google Location Services, which use a form of received signal strength indicator (RSSI)-based geolocation, have gained popularity because of their accuracy, availability, and speed for indoor geolocation without GPS coverage [3]. RSSI-based geolocation measures the signal strengths of wireless access points from various locations to build a database. The location of the smartphone is calculated by first measuring the signal strengths from surrounding wireless access points and then comparing them to the entries in the database. RSSI-based geolocation accuracy depends on the number of wireless access points in the database and has been shown to be within 74 meters [4]. Skyhook, however, has over 50 million wireless access points in its database and reports accuracy within 10 to 20 meters [3].

Recent research has used GPS, because of its availability and accuracy, to link users' geolocation with their daily activities. Examples of activities are walking from the parking lot to the office or being at work. In this study, RSSI data will be used instead because of its ability to provide geolocation indoors. The RSSI data of a user's daily activity, collected from a smartphone, will be used to build a profile of the user. A hidden Markov model (HMM) will be used to classify users and verify that the person carrying the smartphone is its authorized user.
A. MOTIVATION

The high-level motivation for this research was to perform preliminary experiments on methods to continuously authenticate a very important person (VIP), such as a high-ranking diplomat. During the course of a VIP's normal hour, day, or week, the algorithm analyzes the RSSI from wireless access points (APs) and, based on the pattern, verifies the identity of the VIP. Conversely, if a VIP's smartphone were lost or stolen, the algorithm would detect an abnormal pattern and identify the user as someone other than the VIP.

B. RESEARCH QUESTION 

This thesis attempts to answer the following questions: 

• Is it possible to authenticate a smartphone user by continuous RSSI-based 
geolocation? 

• Can we use an HMM to model a user's geolocation throughout the day? If yes, can we distinguish between various individuals?

C. SIGNIFICANT FINDINGS 

The results of this thesis show the feasibility of continuously authenticating a smartphone user by modeling user behavior based on RSSI evidence. Using an HMM, the average precision, recall, and f-score for every experiment were greater than 0.7. Because the machine-learning algorithm must account for temporal movements from one location to another, classifiers that ignore the time domain, like clustering and static Bayesian networks, will not work. Since we used a small data set and restricted our test parameters, future work is warranted.
D. THESIS STRUCTURE

This thesis is organized as follows:

• Chapter I covers the motivation, research questions, and significant findings of the research.

• Chapter II discusses prior work as it pertains to this research.

• Chapter III describes the experimental design for this research.

• Chapter IV contains the results and analysis of the experiments.

• Chapter V contains the summary of the research and recommended future work.
II. PRIOR AND RELATED WORK


In this chapter, we first discuss prior research on geolocation data from smartphones. Next, we discuss authentication and describe the different sources of geolocation. Finally, we discuss machine learning and the evaluation criteria for machine learning.

A. RELATED RESEARCH

Ashbrook and Starner [5] conducted two studies attempting to predict the movements of people. The studies used GPS-based geolocation to model human behavior. GPS geolocation data was collected over a 4-month period in Atlanta, Georgia. Because GPS has an accuracy of approximately 15 meters, a person could be in the exact same spot yet log different locations. Ashbrook and Starner used a k-means clustering algorithm to normalize the GPS error by treating all latitudes and longitudes within a half-mile radius as a single discrete location. A Markov model was then derived from the time-sequenced locations. The Markov model was able to predict the probability of where a person is headed based on their current location [5].
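To illustrate the last two steps, the sketch below (Python 3) estimates a first-order Markov model from a time-ordered sequence of discrete locations by counting transitions, and then predicts the most likely next location. It assumes the GPS points have already been clustered into location labels; the labels are hypothetical and this is not Ashbrook and Starner's code.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional, Sequence

Model = Dict[Hashable, Dict[Hashable, float]]


def fit_markov_model(locations: Sequence[Hashable]) -> Model:
    """Estimate P(next | current) from consecutive pairs in a location sequence."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for current, nxt in zip(locations, locations[1:]):
        counts[current][nxt] += 1
    model: Model = {}
    for current, successors in counts.items():
        total = sum(successors.values())
        model[current] = {nxt: n / total for nxt, n in successors.items()}
    return model


def predict_next(model: Model, current: Hashable) -> Optional[Hashable]:
    """Return the most probable next location, or None if `current` was never left."""
    successors = model.get(current)
    if not successors:
        return None
    return max(successors, key=successors.get)


# Hypothetical day of already-clustered locations.
trace = ["home", "work", "cafe", "work", "home", "work", "cafe"]
model = fit_markov_model(trace)
print(round(model["work"]["cafe"], 3))  # 0.667
print(predict_next(model, "work"))      # cafe
```

Counting transitions is the maximum-likelihood estimate for a first-order Markov chain; locations never left in the trace simply have no outgoing row.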

Liao et al. [6] used hierarchical conditional random fields (CRFs) for GPS-based activity recognition. The study collected GPS geolocation data from four users over a one-week period. The GPS locations were clustered into 10-meter segments and then correlated to street locations. The bottom layer of the hierarchical CRF contained nodes from the GPS trace. The middle layer contained nodes of inferred activities such as walking, driving, or getting on the bus, while the top layer contained significant places such as home, work, or shopping. Liao et al. trained the model on the data from three users and used the fourth user's data as the test set. The study achieved above 90% accuracy for navigation activities and 85% accuracy for significant places [6].

De Montjoye et al. [7] used anonymous cellphone data for one and a half million users over a 15-month period in Western Europe to find unique traces in human mobility. Each time a user made or received a call or text message, the service provider logged the time and all cellphone towers within range. From these logs, spatially and temporally correlated information could be derived. Figure 1 depicts a sequence of calls made by a user and the area where cellphone towers were in range of the user. The study did not use machine-learning classifiers to find the traces for users. Instead, the study used set theory to extract unique traces from a set of spatial-temporal points in the mobility dataset. A unique trace is a vector of spatial-temporal points that appears only once in the dataset. The study showed that four spatial-temporal points are enough to uniquely identify 95% of users [7].
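The unicity test behind this result can be sketched as follows (Python 3): draw p points at random from one user's trace and count how many traces in the dataset contain all of them; the user is singled out if only their own trace matches. This is a simplified illustration under assumed data structures (each trace is a set of hypothetical (antenna ID, hour) pairs), not the authors' implementation, and it omits the spatial and temporal coarsening shown in Figure 1.

```python
import random
from typing import Dict, Set, Tuple

Point = Tuple[str, int]  # (antenna ID, hour of observation)


def matching_users(traces: Dict[str, Set[Point]], points: Set[Point]) -> Set[str]:
    """Users whose traces contain every one of the given points."""
    return {user for user, trace in traces.items() if points <= trace}


def unicity(traces: Dict[str, Set[Point]], p: int, rng: random.Random) -> float:
    """Fraction of users (with at least p points) singled out by p random points."""
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # sorted() gives random.sample a sequence and makes runs reproducible.
        sample = set(rng.sample(sorted(traces[user]), p))
        if matching_users(traces, sample) == {user}:
            unique += 1
    return unique / len(eligible)


traces = {
    "a": {("t1", 8), ("t2", 12), ("t3", 18)},
    "b": {("t1", 8), ("t2", 12), ("t4", 18)},
    "c": {("t5", 9)},
}
print(sorted(matching_users(traces, {("t1", 8)})))  # ['a', 'b']
print(unicity(traces, 3, random.Random(0)))         # 1.0
```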
Figure 1. (A) Times and locations of calls made or received and nearest antenna. (B) Approximation of antennas' reception areas. (C) Lower resolution through spatial and temporal aggregation (from [7]).


Alvarez-Alvarez et al. [8] correlated Wi-Fi position and body posture with human activity. In their experiment, they used Wi-Fi position information from four access points in a 440-square-meter test environment, synchronizing its sampling rate with that of an accelerometer. A fuzzy rule-based classifier used the Wi-Fi geolocation to label locations such as an office, break room, or passageway. A fuzzy finite state machine used the accelerometer data to infer the relative posture of the person, such as seated, standing, or walking. A second fuzzy finite state machine fused the relative location with the relative posture to infer human activity. Examples of human activities inferred in the experiment were sitting at a desk, walking to the break room, and having a meeting in a co-worker's office [8].

B. AUTHENTICATION 

Authentication is a systematic method of verifying a set of credentials to validate an authorized user. In computer security, three general factors are used for authentication: authentication by knowledge, authentication by ownership, and authentication by biometrics. Authentication by knowledge relies on something a person knows, such as a password, personal identification number (PIN), or the combination to a lock. Examples of authentication by ownership are keys, access cards, or badges, which an individual would possess. Authentication by biometrics targets physical attributes such as fingerprints, irises, or palm prints [9].

This thesis examines the possibility of authentication by behavior using the sensors in modern smartphones. Examples of this type of authentication are gait analysis, keystroke analysis, and pattern of life. Gait analysis studies the uniqueness of a person's walking motion. Keystroke analysis studies the time intervals between keystrokes while a person types. This research focuses on pattern of life, which is a person's movement between various locations throughout a normal day.

C. GEOLOCATION 

Geolocation is the process of determining the geographic location of an object, such as a smartphone or handheld GPS receiver, by electronic means. Geolocation uses positioning systems such as GPS, cell tower triangulation, or RSSI-based methods [2].

1. GPS 

GPS provides localization accuracy within 7.8 meters and continuous availability from a constellation of 24 satellites. GPS requires line-of-sight acquisition of at least 3 satellites in order to calculate a receiver's current location. However, depending on the receiver's antenna gain, the signals from the GPS satellites can be impeded by inclement weather or by obstructions such as buildings or mountains [1]. The type of GPS receiver in smartphones varies by vendor, so satellite acquisition times range from seconds to minutes [2].

2. Geometric Triangulation of Cell Towers 

Cell towers are another source of geolocation when GPS is not available. When a mobile phone user makes or receives a call, the mobile phone logs the time and the cellular identification of the cell towers in range. The distance to each tower is estimated from the ping time between the cell tower and the mobile phone. Using the estimated distances from multiple cell towers, geometric triangulation calculates the approximate geolocation to within two kilometers [10].
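Working from estimated distances, as described here, is usually called trilateration. The sketch below (Python 3) shows one common way to solve it, assuming a flat local x/y coordinate system in meters and hypothetical tower positions: subtracting the first tower's circle equation from the others yields linear equations in x and y, which are solved by least squares. It is an illustration only, not the method used by any particular carrier or by [10].

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def trilaterate(towers: Sequence[Point], distances: Sequence[float]) -> Point:
    """Least-squares position estimate from distances to three or more towers."""
    if len(towers) < 3 or len(towers) != len(distances):
        raise ValueError("need at least three towers, each with one distance")
    (x0, y0), d0 = towers[0], distances[0]
    # Each row encodes 2(xi - x0) x + 2(yi - y0) y = b_i, obtained by subtracting
    # tower 0's circle equation from tower i's.
    rows: List[Tuple[float, float, float]] = []
    for (xi, yi), di in zip(towers[1:], distances[1:]):
        b = d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2
        rows.append((2 * (xi - x0), 2 * (yi - y0), b))
    # Normal equations (A^T A) [x, y]^T = A^T b: a 2x2 system solved by Cramer's rule.
    s11 = sum(ax * ax for ax, _, _ in rows)
    s12 = sum(ax * ay for ax, ay, _ in rows)
    s22 = sum(ay * ay for _, ay, _ in rows)
    t1 = sum(ax * b for ax, _, b in rows)
    t2 = sum(ay * b for _, ay, b in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-9:
        raise ValueError("towers are collinear; position is not determined")
    return (t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


towers = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
true_position = (300.0, 400.0)
distances = [math.dist(t, true_position) for t in towers]
print(trilaterate(towers, distances))  # approximately (300.0, 400.0)
```

With noisy distance estimates, adding more towers lets the least-squares step average out some of the error.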

3. RSSI-based 

RSSI-based geolocation is often used in an indoor environment when both GPS and cell tower signals are blocked. RSSI-based geolocation measures the signal strengths of wireless access points from various locations to build a database. Unlike cell tower triangulation, where the distance from the cell tower to the mobile phone is computed, the distance from the Wi-Fi AP is not calculated from the RSSI. RSSI depends on several factors, including antenna gain, atmospheric conditions, output power, and interference. The location of the smartphone is calculated by first measuring the signal strengths from surrounding Wi-Fi APs and then comparing the values to a known database. RSSI-based geolocation accuracy depends on the number of wireless access points in the database and has been shown to be within 74 meters [4]. Skyhook, however, has over 50 million wireless access points in its database and reports accuracy within 10 to 20 meters [3].
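One common way to perform the comparison step is nearest-neighbor matching in signal space. The sketch below (Python 3) illustrates that idea and is not Skyhook's or Google's proprietary method: each database entry maps AP identifiers to RSSI values in dBm, an AP missing from one side is assigned an assumed floor of -100 dBm, and the location whose fingerprint has the smallest Euclidean distance to the scan is returned. The AP identifiers, RSSI values, and location names are hypothetical.

```python
import math
from typing import Dict, Mapping

Fingerprint = Mapping[str, float]  # AP identifier (MAC address) -> RSSI in dBm

MISSING_DBM = -100.0  # assumed value for an AP that was not heard


def signal_distance(a: Fingerprint, b: Fingerprint, missing: float = MISSING_DBM) -> float:
    """Euclidean distance between two fingerprints over the union of their APs."""
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, missing) - b.get(ap, missing)) ** 2 for ap in aps))


def locate(scan: Fingerprint, database: Mapping[str, Fingerprint]) -> str:
    """Return the database location whose fingerprint is closest to the scan."""
    return min(database, key=lambda location: signal_distance(scan, database[location]))


# Hypothetical database; "ap1".."ap4" stand in for real MAC addresses.
database: Dict[str, Dict[str, float]] = {
    "GE basement": {"ap1": -45.0, "ap2": -70.0},
    "GE 3rd floor": {"ap1": -80.0, "ap3": -50.0},
    "Del Monte Cafe": {"ap4": -40.0},
}
scan = {"ap1": -48.0, "ap2": -72.0}
print(locate(scan, database))  # GE basement
```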

D. ANDROID WI-FI MANAGER APPLICATION PROGRAMMING INTERFACE (API)

The Android Wi-Fi Manager API [11] manages all aspects of Wi-Fi connectivity within an Android device. A smartphone user uses the Wi-Fi manager to scan for available Wi-Fi networks and the signal strength associated with each network. Once a user selects a Wi-Fi network to connect to, the Wi-Fi manager initiates the required authentication handshake. The following information is received from all Wi-Fi access points within range of the mobile device:

• AP media access control (MAC) address 

• Service set identifier (SSID) 

• Frequency 

• Channel 

• RSSI 

• Timestamp 
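For illustration, the sketch below (Python 3) models one scan result with the fields listed above; the class name is hypothetical. On Android, a scan result reports the BSSID (the AP's MAC address), SSID, frequency in MHz, signal level in dBm, and a timestamp in microseconds since boot. The channel is not a separate field but can be derived from the frequency, as the helper below does for the 2.4 GHz and 5 GHz bands.

```python
from dataclasses import dataclass


def frequency_to_channel(freq_mhz: int) -> int:
    """Map an 802.11 center frequency (MHz) to its channel number."""
    if freq_mhz == 2484:
        return 14  # channel 14 sits outside the regular 5 MHz grid
    if 2412 <= freq_mhz <= 2472:
        return (freq_mhz - 2407) // 5
    if 5000 <= freq_mhz <= 5900:
        return (freq_mhz - 5000) // 5
    raise ValueError(f"unsupported frequency: {freq_mhz} MHz")


@dataclass(frozen=True)
class ApScan:
    """One access point seen in one scan (hypothetical record type)."""
    mac: str            # BSSID
    ssid: str
    frequency_mhz: int
    rssi_dbm: int       # reported by Android as the signal "level"
    timestamp_us: int   # microseconds since boot on Android

    @property
    def channel(self) -> int:
        return frequency_to_channel(self.frequency_mhz)


scan = ApScan("00:11:22:33:44:55", "example-ssid", 2437, -63, 123456789)
print(scan.channel)  # 6
```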
Several apps are available for both Android and iOS devices. Screen shots from Wi-Fi Analyzer, developed by Farproc [12], are shown in Figure 2. Wi-Fi Analyzer is a free app available from the Google Play store. The screen shots were taken in the Glasgow East basement, the Glasgow East third-floor passageway, and the Del Monte Cafe, all located on the NPS campus. The screen shots show that each location, even locations within the same building, has a distinct fingerprint in terms of which Wi-Fi APs are detected and their associated RSSI values. This distinction is used in this thesis to model a smartphone user's pattern of life.



Figure 2. Screen shots using the Wi-Fi Analyzer app (panels: GE Basement, GE 3rd Floor, Del Monte Cafe)


E. HIDDEN MARKOV MODEL 

Machine learning is the process of making predictions about an unknown data set based on properties learned from a known data set used to train the system. Machine learning is sometimes confused with data mining, which is the process of discovering unknown properties in a data set. The premise of machine learning is to take a data set with known labels and build a model. The model is then used to generalize and classify unseen data. A modern example of machine learning is the email spam/not-spam problem. A model is built on key words and word pairs labeled by a human as either spam or not. Using the model, the classifier labels new emails as either spam or not [13]. In this thesis, an HMM, a machine learning algorithm, is used to model individual smartphone users and then to label unseen patterns by user.

An HMM [14] is a machine learning model used when the data depend on the order in which they were collected. An HMM is a probabilistic finite state automaton in which the output depends on the state. For this thesis, an HMM is used because the machine learning algorithm must be able to classify a smartphone user based on transitions between various locations on the NPS campus. Classifiers such as naive Bayes would not work because they account for the similar Wi-Fi APs the students detect but not for the changes throughout the day.

1. Definition of an HMM

Mathematically, an HMM is the quintuple (S, V, A, B, Π) defined below; the sequences Q and O describe a particular run of the model rather than its parameters.

S is the state alphabet, where N is the number of states:

$$S = \{s_1, \ldots, s_N\}.$$

V is the vocabulary alphabet for the set of M symbols that may be emitted:

$$V = \{v_1, \ldots, v_M\}.$$

Q is a fixed state sequence of length T:

$$Q = q_1, \ldots, q_T.$$

O is the corresponding observation sequence:

$$O = o_1, \ldots, o_T.$$

A is the transition probability matrix, where $a_{ij}$ is the probability of transitioning from state $s_i$ to state $s_j$:

$$A = [a_{ij}], \quad a_{ij} = P(q_t = s_j \mid q_{t-1} = s_i).$$

B is the emission probability matrix, where $b_i(k)$ is the probability of emitting symbol $v_k$ in state $s_i$:

$$B = [b_i(k)], \quad b_i(k) = P(o_t = v_k \mid q_t = s_i).$$

Π is the initial probability distribution, giving the probability of starting in each state:

$$\Pi = [\pi_i], \quad \pi_i = P(q_1 = s_i).$$
The Markov assumption states that the current state depends only on the previous state:

$$P(q_t \mid q_1, \ldots, q_{t-1}) = P(q_t \mid q_{t-1}).$$

The output-independence assumption states that the observation at time t depends only on the current state:

$$P(o_t \mid o_1, \ldots, o_{t-1}, q_1, \ldots, q_t) = P(o_t \mid q_t).$$

2. Three Fundamental Problems for HMMs

There are three fundamental problems in HMM design: evaluation, decoding, and learning. Chapter 15 of Russell and Norvig [13] describes the mathematical process for solving them. Once these problems are solved, an HMM can be applied to numerous statistical problems. Evaluation, decoding, and learning are defined as follows:

• Evaluation: Given an observation sequence and an HMM, determine the probability of the observation sequence.

• Decoding: Given an observation sequence and an HMM, determine the optimal sequence of model states.

• Learning: Given one or more observation sequences, adjust the model parameters to maximize the probability of the observed sequences.
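The evaluation and decoding problems have standard dynamic-programming solutions: the forward algorithm and the Viterbi algorithm. The sketch below (Python 3) implements both for a discrete-output HMM, using `pi` for Π, `A[i][j]` for $a_{ij}$, and `B[i][k]` for $b_i(k)$ as defined above; the two states and two symbols are hypothetical toy values, not data from this thesis. Learning is usually solved with the Baum-Welch (expectation-maximization) algorithm, which is omitted here for brevity, and long sequences would need log probabilities or scaling to avoid numerical underflow.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(obs: Sequence[int], pi: Sequence[float], A: Matrix, B: Matrix) -> float:
    """Evaluation: P(O | model), summed over all hidden state paths."""
    n = len(pi)
    # alpha[i] = P(o_1..o_t, q_t = s_i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def viterbi(obs: Sequence[int], pi: Sequence[float], A: Matrix, B: Matrix) -> Tuple[List[int], float]:
    """Decoding: the most probable state sequence and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(ptr)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy model: state 0 and state 1, symbols 0 and 1 (all values hypothetical).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1]
print(round(forward(obs, pi, A, B), 5))  # 0.13623
path, prob = viterbi(obs, pi, A, B)
print(path)                              # [0, 0, 1]
```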

F. EVALUATION CRITERIA 

Machine learning algorithms use the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) as measurements of performance. Their definitions are as follows:

• TP: a member of the class correctly identified as belonging to it

• FP: a non-member incorrectly identified as belonging to the class

• TN: a non-member correctly rejected

• FN: a member incorrectly rejected.
1. Confusion Matrix


A confusion matrix is often used as a visualization tool showing the performance of a classifier. An example of a confusion matrix is shown in Table 1, where each row is the true class and each column is the inferred label. In the example, there are 8 red, 6 blue, and 13 green instances. For class red, the confusion matrix yields the following results:

• 5 TP: actual red classified as red

• 1 FP: blue incorrectly classified as red

• 3 FN: red incorrectly classified as blue (2) and green (1)

• 18 TN: the remaining non-red instances classified as non-red.


| Truth (rows) / Inferred label (columns) | Red | Blue | Green |
|---|---|---|---|
| Red | 5 | 2 | 1 |
| Blue | 1 | 2 | 3 |
| Green | 0 | 4 | 9 |

Table 1. Example of a confusion matrix


2. Precision

Precision is also known as the positive predictive value. Precision is the fraction of the instances assigned to a class that actually belong to that class. In our example of the confusion matrix, red would have a precision of 5/6, which is the number of red instances correctly identified divided by the total number inferred as red (the total of the column). The formula for precision is as follows:

$$\text{precision} = \frac{TP}{TP + FP}.$$


3. Recall

Recall measures the sensitivity of the algorithm. Recall is the fraction of the actual class that is correctly labeled. In our example of the confusion matrix, red would have a recall of 5/8, which is the number of red instances correctly identified divided by the actual number of red instances (the total of the row). The formula for recall is as follows:

$$\text{recall} = \frac{TP}{TP + FN}.$$


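4. F-score

F-score is the harmonic mean of precision and recall. Because it takes both precision and recall into account, the f-score measures the algorithm's overall performance. In our example of the confusion matrix, red would have an f-score of 5/7, or approximately 0.71. The formula for f-score is as follows [15]:

$$F\text{-score} = \frac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$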
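The worked example for class red can be checked with a short sketch (Python 3). It derives TP, FP, FN, and TN for one class from a confusion matrix laid out as in Table 1 (rows are the true class, columns the inferred label) and then applies the three formulas; the function names are illustrative.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def class_counts(matrix: Matrix, k: int) -> Dict[str, int]:
    """TP, FP, FN, TN for class index k (rows = truth, columns = inferred)."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # rest of the row
    fp = sum(row[k] for row in matrix) - tp  # rest of the column
    tn = total - tp - fn - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def precision_recall_f(c: Dict[str, int]) -> Tuple[float, float, float]:
    """Precision, recall, and their harmonic mean; 0.0 where undefined."""
    predicted = c["TP"] + c["FP"]
    actual = c["TP"] + c["FN"]
    precision = c["TP"] / predicted if predicted else 0.0
    recall = c["TP"] / actual if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Table 1, with classes ordered red, blue, green.
table1 = [[5, 2, 1],
          [1, 2, 3],
          [0, 4, 9]]
red = class_counts(table1, 0)
print(red)                                             # {'TP': 5, 'FP': 1, 'FN': 3, 'TN': 18}
print([round(v, 3) for v in precision_recall_f(red)])  # [0.833, 0.625, 0.714]
```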
III. EXPERIMENTAL DESIGN


This chapter documents the methodology and technical approach used in developing the experimental design for this thesis. The methodology includes the research subjects and test parameters. The technical approach covers the tools used for data collection and for transforming the data into a data structure suitable for an HMM.

A. HARDWARE AND SOFTWARE 

The following hardware and software were used in this project: 

• Google Nexus 4 smartphone with Android version 4.3 and 16 GB of memory

• Power Mac with dual 3 GHz Intel Xeon processors and 16 GB of 667 MHz RAM

• Python 2.7.5 

• Funf Journal for Android 

• Wi-Fi Analyzer for Android. 

B. RESEARCH SUBJECTS

This thesis research used graduate students from NPS, located in Monterey, California, to collect RSSI data. NPS courses use the quarter system, in which each student is required to take a minimum course load of four classes each quarter. Course lectures are one hour each and are given Monday through Thursday, with Fridays reserved for labs. Each student was assigned a randomly generated PIN to be used throughout this research in order to keep personally identifiable information confidential. The students each carried a Google Nexus 4 smartphone Monday through Thursday. When the students arrived on campus at the beginning of the day, they turned on the sensors for collection. If the students left campus for lunch or any other reason, they turned off the sensors until their return to campus. At the end of the day, each student turned off the sensors and locked the smartphone in a secure locker provided for that purpose. In addition, the students maintained a log of times and locations on campus. The log was used to filter the data set for times when a student was off campus but forgot to turn off the sensors. Table 2 lists the PIN, major, and number of data points collected for each of the nine students. The number of data points collected for each student varied according to the student's schedule. Some students stayed on campus to study, while others were on campus only for lectures.


| PIN | Major | # Data Points |
|---|---|---|
| 175 | Computer Science | 14,784 |
| 122 | Computer Science | 6,021 |
| 154 | National Security Affairs | 17,679 |
| 112 | Business | 15,111 |
| 198 | Information Assurance | 3,906 |
| 141 | Business | 16,337 |
| 128 | National Security Affairs | 13,611 |
| 111 | Information Systems | 6,499 |
| 372 | Computer Science | 14,589 |

Table 2. Research subjects' PIN, major, and number of data points

C. LOCATION OF EXPERIMENT 

The data for this thesis was collected on the NPS campus in Monterey, California. The NPS campus is approximately 640 acres, or 2.5 square kilometers. Figure 3 is a map of the NPS campus. Approximately one-fourth of the campus houses the academic buildings, while the rest houses tenant facilities for Naval Support Activity, Monterey. The academic buildings, where the majority of the data was collected, are shown in yellow on the map.
Figure 3. Map of Naval Postgraduate School, Monterey, California (from [16]).

D. DATA COLLECTION PARAMETERS 

Funf Journal [17] was used to collect the data for this research. Funf Journal is an open-source framework that allows researchers to use Android sensors to collect and store environmental and movement data. The app was downloaded from the Google Play store. Funf contains 38 probes that enable researchers to collect data such as Wi-Fi, location, and accelerometer readings. Figure 4 shows screen shots of the Funf Journal positioning probes. For this research, the probes for nearby cellular towers, simple location, and nearby Wi-Fi devices were set to collect data every minute. The data is encrypted and then stored in a structured query language (SQL) database on the Nexus 4. The export button allows the researcher to e-mail the encrypted files. Once on a desktop computer, the files are decrypted into a database (.db) file and then converted to a comma-separated values (CSV) file [17].
Figure 4. Screen shot of Funf Journal


E. SPARSE MATRIX 

Once the data is extracted, the fields of the CSV file are parsed and filtered. The parsed file contains a list of tuples holding the timestamp, RSSI, and MAC address of every detected Wi-Fi AP. A Python script takes the list of tuples as input and forms a sparse vector; a sparse vector or sparse matrix is one that contains mostly zeroes [18]. The tuples are transformed into a sparse vector so that the data set can be used as input to an HMM. The script initially builds a vector of zeros indexed by MAC address. Each time a previously unseen MAC address is detected, a new zero-valued element is added at the next sequential position after the MAC addresses already in the vector. Once the sparse vector is created, the script populates the sparse matrix. Each row of the sparse matrix corresponds to a one-minute sampling interval and each column to a MAC address; if a MAC address was detected within a given minute, its RSSI value replaces the zero in that cell. The number of Wi-Fi APs scanned each minute varied from 1 to 20. A binary representation of the sparse matrix of test subject PIN-372 for a Wednesday from 0800 to 1700 is shown in Figure 5. The horizontal axis is the MAC address, while the vertical axis is the time interval in minutes. The sparse matrix shows the pattern as the user moves between classrooms throughout the day.
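The construction described above can be sketched as follows (Python 3). This is an illustration, not the thesis's script: it assumes each parsed tuple is (timestamp in seconds, RSSI in dBm, MAC address), that rows are minutes counted from a given start time, and that a MAC index dictionary can be passed in and shared so that different users' matrices use the same column positions (matrices built earlier would need zero padding if later users add new MAC addresses). The function name is hypothetical. For readability it stores a dense list of lists; a sparse container such as those in `scipy.sparse` would store only the nonzero cells.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Sample = Tuple[int, int, str]  # (Unix timestamp in seconds, RSSI in dBm, MAC address)


def build_rssi_matrix(
    samples: Iterable[Sample],
    start_ts: int,
    n_minutes: int,
    mac_index: Optional[Dict[str, int]] = None,
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Rows are one-minute intervals, columns are MAC addresses, cells hold RSSI (0 = not seen)."""
    if mac_index is None:
        mac_index = {}
    cells = []
    for ts, rssi, mac in samples:
        minute = (ts - start_ts) // 60
        if not 0 <= minute < n_minutes:
            continue  # outside the collection window
        if mac not in mac_index:
            mac_index[mac] = len(mac_index)  # next sequential column for an unseen MAC
        cells.append((minute, mac_index[mac], rssi))
    matrix = [[0] * len(mac_index) for _ in range(n_minutes)]
    for minute, column, rssi in cells:
        matrix[minute][column] = rssi  # a later reading in the same minute overwrites
    return matrix, mac_index


samples = [(0, -40, "aa"), (30, -60, "bb"), (60, -45, "aa"), (125, -70, "cc"), (999, -50, "dd")]
matrix, index = build_rssi_matrix(samples, start_ts=0, n_minutes=3)
print(index)   # {'aa': 0, 'bb': 1, 'cc': 2}
print(matrix)  # [[-40, -60, 0], [-45, 0, 0], [0, 0, -70]]
```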
Figure 5. Sparse matrix


F. SPLITTING MATRIX INTO TRAINING AND TEST DATASET 

The sparse matrix from each student was divided into windows of 25 minutes. The result was a set of submatrices, each with 25 rows, one per minute, and 1,154 columns, the total number of MAC addresses detected across all users. Floyd's algorithm [19] for selecting random combinations was used to divide the submatrices into training and test sets. For the initial experiment, the algorithm randomly selected 80% of the dataset with uniform probability without replacement. The remaining 20% was used for testing. Ten runs were conducted for each experiment, each with newly generated random training and test sets, giving ten rounds of repeated random sub-sampling validation.
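A minimal sketch of this step (Python 3) is shown below. Splitting into non-overlapping windows and dropping a trailing partial window are assumptions, since the text does not say how leftover minutes were handled; the function names are hypothetical. `floyd_sample` follows Robert Floyd's algorithm, which draws k distinct indices from range(n) with uniform probability using exactly k random draws.

```python
import random
from typing import List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def split_into_windows(rows: Sequence[T], window: int = 25) -> List[Sequence[T]]:
    """Non-overlapping windows of `window` rows; a trailing partial window is dropped."""
    return [rows[i:i + window] for i in range(0, len(rows) - window + 1, window)]


def floyd_sample(n: int, k: int, rng: random.Random) -> Set[int]:
    """Robert Floyd's algorithm: k distinct indices from range(n), uniformly at random."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    chosen: Set[int] = set()
    for j in range(n - k, n):
        t = rng.randint(0, j)  # inclusive of j
        chosen.add(j if t in chosen else t)
    return chosen


def train_test_split(windows: Sequence[T], train_fraction: float,
                     rng: random.Random) -> Tuple[List[T], List[T]]:
    """Select round(train_fraction * len) windows for training; the rest are for testing."""
    train_idx = floyd_sample(len(windows), round(train_fraction * len(windows)), rng)
    train = [w for i, w in enumerate(windows) if i in train_idx]
    test = [w for i, w in enumerate(windows) if i not in train_idx]
    return train, test


minutes = list(range(540))             # e.g., 0800 to 1700 as 540 one-minute rows
windows = split_into_windows(minutes)  # 21 windows of 25 rows (15 rows left over)
train, test = train_test_split(windows, 0.8, random.Random(1))
print(len(windows), len(train), len(test))  # 21 17 4
```

Repeating the split with a fresh random generator for each run corresponds to the ten runs per experiment described above.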

G. CLASSIFIER 

Once the dataset was randomly divided into training and test subsets, the Gaussian HMM from scikit-learn [20] for Python 2.7.5 was used to classify each user. The results were tabulated in a confusion matrix, from which the precision, recall, and f-score were calculated.
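The text does not spell out how a trained Gaussian HMM assigns a test window to a user. One common scheme, assumed here, trains one HMM per user and labels a window with the user whose model gives the highest log-likelihood. The sketch below (Python 3, standard library only) shows that decision rule with a diagonal-covariance Gaussian HMM whose forward pass runs in log space. The parameters are hand-set toy values rather than trained ones, and the class name is hypothetical; in practice the parameters would be learned with the Baum-Welch algorithm, for example with the `fit` method of `GaussianHMM` in hmmlearn, the package to which scikit-learn's HMM module was later moved.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def _log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


class DiagGaussianHMM:
    """Hypothetical minimal Gaussian HMM that only scores sequences."""

    def __init__(self, startprob, transmat, means, variances):
        self.log_start = [_log(p) for p in startprob]
        self.log_trans = [[_log(p) for p in row] for row in transmat]
        self.means = means
        self.variances = variances

    def score(self, window: Sequence[Vector]) -> float:
        """log P(window | model), computed with the forward algorithm in log space."""
        n = len(self.log_start)

        def emit(state: int, x: Vector) -> float:
            return _log_gaussian_diag(x, self.means[state], self.variances[state])

        alpha = [self.log_start[i] + emit(i, window[0]) for i in range(n)]
        for x in window[1:]:
            alpha = [_logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)]) + emit(j, x)
                     for j in range(n)]
        return _logsumexp(alpha)


def classify(window: Sequence[Vector], models: Dict[str, DiagGaussianHMM]) -> str:
    """Label a window with the user whose model explains it best."""
    return max(models, key=lambda user: models[user].score(window))


# Toy example: two users, two MAC-address columns, two hidden states each.
sticky = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "user A": DiagGaussianHMM([0.5, 0.5], sticky, [[-40.0, 0.0], [-60.0, 0.0]], [[25.0, 1.0]] * 2),
    "user B": DiagGaussianHMM([0.5, 0.5], sticky, [[0.0, -40.0], [0.0, -60.0]], [[1.0, 25.0]] * 2),
}
window = [[-42.0, 0.0], [-41.0, 0.0], [-58.0, 0.0]]
print(classify(window, models))  # user A
```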
IV. RESULTS AND ANALYSIS


In this chapter, we review the results of our experiments. We start with the initial parameters: a window size of 25 minutes, 80% of the data for training and 20% for testing, and sparse matrices populated with the detected RSSI values. In each subsequent experiment, we varied one of these parameters. For each experiment, ten runs were conducted, resampling the training and test sets each time (repeated random sub-sampling validation). In this chapter, we show the confusion matrix only for the first run with the initial parameters; the remaining confusion matrices are in the appendix.

A. INITIAL PARAMETERS 

The confusion matrix for our initial experiment is shown in Table 3. See Table 4 for the precision, recall, and f-score for each of our ten runs and their averages. For our initial parameters, we used a window size of 25 minutes, with 80% of the data for training and 20% for testing.


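| Truth (rows) / Inferred labels (columns) | 111 | 112 | 122 | 128 | 141 | 154 | 175 | 198 | 372 |
|---|---|---|---|---|---|---|---|---|---|
| 111 | 22 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 7 |
| 112 | 1 | 43 | 6 | 0 | 3 | 7 | 4 | 0 | 1 |
| 122 | 0 | 0 | 17 | 0 | 0 | 0 | 6 | 0 | 0 |
| 128 | 0 | 25 | 0 | 45 | 13 | 0 | 0 | 0 | 0 |
| 141 | 0 | 10 | 0 | 0 | 60 | 0 | 0 | 2 | 0 |
| 154 | 0 | 6 | 1 | 0 | 2 | 91 | 0 | 0 | 0 |
| 175 | 0 | 0 | 3 | 0 | 0 | 4 | 35 | 0 | 0 |
| 198 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 9 | 0 |
| 372 | 0 | 4 | 0 | 0 | 4 | 0 | 0 | 1 | 60 |

Table 3. Confusion matrix for initial parameters, run 1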
| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.81 | 0.77 | 0.77 |
| Run 2 | 0.77 | 0.75 | 0.75 |
| Run 3 | 0.84 | 0.82 | 0.82 |
| Run 4 | 0.82 | 0.79 | 0.80 |
| Run 5 | 0.83 | 0.81 | 0.81 |
| Run 6 | 0.83 | 0.78 | 0.78 |
| Run 7 | 0.84 | 0.83 | 0.83 |
| Run 8 | 0.79 | 0.75 | 0.76 |
| Run 9 | 0.81 | 0.79 | 0.79 |
| Run 10 | 0.80 | 0.77 | 0.78 |
| Avg | 0.81 | 0.79 | 0.79 |

Table 4. Precision, Recall, and F-score from initial parameters
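The text does not state how the per-user results in Table 3 are combined into the single precision, recall, and f-score reported for each run. Averaging the per-user values weighted by each user's number of test windows (the row totals), as scikit-learn does with `average='weighted'`, reproduces the run 1 values in Table 4 (0.81, 0.77, and 0.77), so the sketch below (Python 3) assumes that scheme.

```python
from typing import List, Tuple

Matrix = List[List[int]]

USERS = [111, 112, 122, 128, 141, 154, 175, 198, 372]  # row and column order
# Table 3 (initial parameters, run 1): rows = true user, columns = inferred user.
TABLE3: Matrix = [
    [22, 2, 0, 0, 0, 1, 0, 0, 7],
    [1, 43, 6, 0, 3, 7, 4, 0, 1],
    [0, 0, 17, 0, 0, 0, 6, 0, 0],
    [0, 25, 0, 45, 13, 0, 0, 0, 0],
    [0, 10, 0, 0, 60, 0, 0, 2, 0],
    [0, 6, 1, 0, 2, 91, 0, 0, 0],
    [0, 0, 3, 0, 0, 4, 35, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 9, 0],
    [0, 4, 0, 0, 4, 0, 0, 1, 60],
]


def weighted_scores(matrix: Matrix) -> Tuple[float, float, float]:
    """Support-weighted precision, recall, and f-score over all classes."""
    total = sum(sum(row) for row in matrix)
    p_sum = r_sum = f_sum = 0.0
    for k, row in enumerate(matrix):
        tp = row[k]
        support = sum(row)                              # true windows for user k
        predicted = sum(other[k] for other in matrix)   # windows labeled as user k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        p_sum += support * p
        r_sum += support * r
        f_sum += support * f
    return p_sum / total, r_sum / total, f_sum / total


print([round(v, 2) for v in weighted_scores(TABLE3)])  # [0.81, 0.77, 0.77]
```

Under this weighting, recall equals overall accuracy (382 correct out of 496 test windows), which is why users with many windows, such as 154, influence the summary most.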


B. BINARY

For the binary experiment, instead of populating the sparse matrix with the RSSI value corresponding to the MAC address, a 1 was used if a Wi-Fi AP was detected; otherwise, the default value of 0 was used. The resulting sparse matrices contain only 0s and 1s. The purpose of this experiment is to determine whether we can authenticate a user using only which Wi-Fi APs were detected, without taking the RSSI values into account.



| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.81 | 0.77 | 0.77 |
| Run 2 | 0.77 | 0.75 | 0.75 |
| Run 3 | 0.84 | 0.82 | 0.82 |
| Run 4 | 0.82 | 0.79 | 0.80 |
| Run 5 | 0.83 | 0.81 | 0.81 |
| Run 6 | 0.83 | 0.78 | 0.78 |
| Run 7 | 0.84 | 0.83 | 0.83 |
| Run 8 | 0.79 | 0.75 | 0.76 |
| Run 9 | 0.81 | 0.79 | 0.79 |
| Run 10 | 0.80 | 0.77 | 0.78 |
| Avg | 0.81 | 0.79 | 0.79 |

Table 5. Precision, Recall, and F-score for binary
C. LOGARITHMIC VALUE OF RSSI


In this experiment, we attempt to normalize the dataset by taking the logarithm of the RSSI values. The purpose is to account for fluctuations in RSSI caused by interference, which can change an RSSI value by ±3 decibels. Taking the logarithm of the RSSI magnitude, rounded to one decimal place, groups nearby values together. For example, the base-3 logarithms of the magnitudes of RSSI values -26, -27, and -28 all round to 3.0, while those of -29, -30, and -31 round to 3.1. Table 6, Table 7, and Table 8 list the precision, recall, and f-score for log base 3, log base 5, and log base 7, respectively.
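Both cell-level transformations, the binary encoding of Section B and the rounded logarithm described here, can be sketched as follows (Python 3). The sketch assumes the logarithm is taken of the RSSI magnitude (the logarithm of a negative dBm value is undefined), that results are rounded to one decimal place as the example implies, and that zero cells (APs not detected) stay zero; the function names are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def binary_cell(rssi: float) -> int:
    """1 if the AP was detected in this minute, else 0."""
    return 1 if rssi != 0 else 0


def log_cell(rssi: float, base: float) -> float:
    """Rounded logarithm of the RSSI magnitude; undetected (zero) cells stay 0."""
    if rssi == 0:
        return 0.0
    return round(math.log(abs(rssi), base), 1)


def transform(matrix: Matrix, cell: Callable[[float], float]) -> Matrix:
    """Apply a cell-level transformation to every entry of a sparse matrix."""
    return [[cell(v) for v in row] for row in matrix]


print([log_cell(v, 3) for v in (-26, -27, -28, -29, -30, -31)])
# [3.0, 3.0, 3.0, 3.1, 3.1, 3.1]
print(transform([[-45, 0], [0, -70]], binary_cell))  # [[1, 0], [0, 1]]
```

Because larger bases compress the RSSI range more, log base 7 groups more neighboring values into the same bin than log base 3.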



| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.81 | 0.78 | 0.78 |
| Run 2 | 0.79 | 0.79 | 0.79 |
| Run 3 | 0.77 | 0.73 | 0.74 |
| Run 4 | 0.84 | 0.82 | 0.83 |
| Run 5 | 0.78 | 0.75 | 0.74 |
| Run 6 | 0.79 | 0.77 | 0.77 |
| Run 7 | 0.75 | 0.68 | 0.69 |
| Run 8 | 0.80 | 0.75 | 0.76 |
| Run 9 | 0.81 | 0.78 | 0.79 |
| Run 10 | 0.82 | 0.81 | 0.80 |
| Avg | 0.80 | 0.77 | 0.77 |

Table 6. Precision, Recall, and F-score for Log 3



| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.85 | 0.84 | 0.84 |
| Run 2 | 0.83 | 0.80 | 0.81 |
| Run 3 | 0.79 | 0.74 | 0.74 |
| Run 4 | 0.82 | 0.79 | 0.79 |
| Run 5 | 0.78 | 0.77 | 0.77 |
| Run 6 | 0.82 | 0.74 | 0.75 |
| Run 7 | 0.80 | 0.74 | 0.75 |
| Run 8 | 0.84 | 0.80 | 0.80 |
| Run 9 | 0.79 | 0.75 | 0.76 |
| Run 10 | 0.80 | 0.79 | 0.79 |
| Avg | 0.81 | 0.78 | 0.78 |

Table 7. Precision, Recall, and F-score for Log 5
| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.85 | 0.84 | 0.84 |
| Run 2 | 0.79 | 0.74 | 0.74 |
| Run 3 | 0.85 | 0.83 | 0.83 |
| Run 4 | 0.84 | 0.80 | 0.81 |
| Run 5 | 0.77 | 0.75 | 0.75 |
| Run 6 | 0.84 | 0.83 | 0.83 |
| Run 7 | 0.81 | 0.77 | 0.78 |
| Run 8 | 0.81 | 0.80 | 0.80 |
| Run 9 | 0.84 | 0.82 | 0.82 |
| Run 10 | 0.84 | 0.82 | 0.82 |
| Avg | 0.82 | 0.80 | 0.80 |

Table 8. Precision, Recall, and F-score for Log 7


D. VARYING THE WINDOW SIZE 

In this experiment, we vary the window size of the sparse matrix. Because NPS classes are 50 minutes long and start on the hour, varying the window size could better capture the transition times when students move from one class to another. Three different window sizes were used in this experiment. Table 9, Table 10, and Table 11 present the precision, recall, and f-score for window sizes of 10, 15, and 20 minutes, respectively.



| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.79 | 0.77 | 0.77 |
| Run 2 | 0.76 | 0.75 | 0.74 |
| Run 3 | 0.77 | 0.74 | 0.75 |
| Run 4 | 0.80 | 0.74 | 0.74 |
| Run 5 | 0.80 | 0.77 | 0.77 |
| Run 6 | 0.76 | 0.74 | 0.74 |
| Run 7 | 0.82 | 0.79 | 0.80 |
| Run 8 | 0.81 | 0.79 | 0.79 |
| Run 9 | 0.85 | 0.81 | 0.81 |
| Run 10 | 0.78 | 0.76 | 0.76 |
| Avg | 0.79 | 0.77 | 0.77 |

Table 9. Precision, Recall, and F-score for Window Size 10
| | Precision | Recall | F-score |
|---|---|---|---|
| Run 1 | 0.81 | 0.75 | 0.76 |
| Run 2 | 0.83 | 0.81 | 0.81 |
| Run 3 | 0.77 | 0.75 | 0.76 |
| Run 4 | 0.80 | 0.77 | 0.78 |
| Run 5 | 0.84 | 0.83 | 0.83 |
| Run 6 | 0.81 | 0.76 | 0.77 |
| Run 7 | 0.79 | 0.75 | 0.76 |
| Run 8 | 0.82 | 0.80 | 0.80 |
| Run 9 | 0.80 | 0.74 | 0.76 |
| Run 10 | 0.82 | 0.81 | 0.81 |
| Avg | 0.81 | 0.78 | 0.78 |

Table 10. Precision, Recall, and F-score for Window Size 15



          Precision   Recall   F-Score
Run 1     0.80        0.79     0.79
Run 2     0.78        0.76     0.75
Run 3     0.71        0.68     0.68
Run 4     0.80        0.77     0.77
Run 5     0.80        0.78     0.78
Run 6     0.78        0.78     0.77
Run 7     0.81        0.80     0.80
Run 8     0.85        0.83     0.83
Run 9     0.80        0.74     0.75
Run 10    0.81        0.80     0.80
Avg       0.79        0.77     0.77


Table 11. Precision, Recall, and F-score for Window Size 20 
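The effect of the window size on the number of observations can be illustrated with a short sketch. It assumes, for illustration only, that the window size is a span of minutes and that each row of the sparse matrix holds the mean RSSI of every AP heard during that window. The `Scan` record and `window_rows` function are hypothetical names, not the thesis code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Scan:
    """One Wi-Fi scan result (hypothetical record layout)."""
    minute: int   # minutes since midnight when the scan was taken
    bssid: str    # access point identifier
    rssi: int     # received signal strength in dBm


def window_rows(scans, window_minutes):
    """Group scans into fixed-length windows and average each AP's RSSI.

    Returns {window_index: {bssid: mean_rssi}}. An AP that was not heard
    during a window is simply absent from that row, which keeps rows sparse.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for s in scans:
        w = s.minute // window_minutes  # index of the window containing this scan
        sums[w][s.bssid] += s.rssi
        counts[w][s.bssid] += 1
    return {
        w: {ap: sums[w][ap] / counts[w][ap] for ap in sums[w]}
        for w in sorted(sums)
    }


# Scans at 09:05 (minute 545) and 09:12 (minute 552).
scans = [Scan(545, "ap1", -60), Scan(552, "ap1", -70), Scan(552, "ap2", -80)]
print(window_rows(scans, 10))  # {54: {'ap1': -60.0}, 55: {'ap1': -70.0, 'ap2': -80.0}}
print(window_rows(scans, 15))  # {36: {'ap1': -65.0, 'ap2': -80.0}}
```

With 10-minute windows the two scan times fall into different rows; with 15-minute windows they merge into one. Smaller windows therefore yield more observations per subject, which is consistent with the larger class totals in the appendix (for example, 93 test observations for label 111 in Table 62, against 59 in Table 72 and 42 in Table 82).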


E. CHANGING PROPORTION OF TRAINING VERSUS TESTING DATA 

In this experiment, we changed the proportion of training to testing data. For 
machine learning algorithms, a common rule of thumb is to use 80% of the data to train 
the model and reserve the remaining 20% to test the completed model. Table 12 
presents the precision, recall, and f-score when only 50% of the data was used for 
training and the remaining 50% for testing. 
          Precision   Recall   F-Score
Run 1     0.79        0.73     0.72
Run 2     0.78        0.77     0.77
Run 3     0.82        0.70     0.71
Run 4     0.79        0.72     0.72
Run 5     0.75        0.72     0.72
Run 6     0.82        0.74     0.76
Run 7     0.81        0.77     0.77
Run 8     0.85        0.83     0.84
Run 9     0.81        0.79     0.79
Run 10    0.83        0.82     0.82
Avg       0.80        0.76     0.76


Table 12. Precision, Recall, and F-score for 50% Training, 50% Test 
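The 80/20 and 50/50 splits can be sketched as follows. This sketch assumes each subject's observations are held in time order and that the first fraction of them is used for training; whether the thesis split its data chronologically or at random is not stated in this section, so the chronological cut is an illustrative choice. The function names are hypothetical.

```python
def split_train_test(observations, train_fraction=0.8):
    """Split one subject's time-ordered observations into train and test lists.

    A chronological cut means every test observation comes after the
    observations used for training.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(round(len(observations) * train_fraction))
    return observations[:cut], observations[cut:]


def split_all(per_subject, train_fraction=0.8):
    """Apply the same split to every subject so each keeps the same proportion."""
    train, test = {}, {}
    for subject, obs in per_subject.items():
        train[subject], test[subject] = split_train_test(obs, train_fraction)
    return train, test


train, test = split_all({"111": list(range(10))}, train_fraction=0.5)
print(train["111"], test["111"])  # [0, 1, 2, 3, 4] [5, 6, 7, 8, 9]
```

Splitting per subject, rather than over the pooled data, keeps each subject represented in both the training and the test set at the chosen proportion.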


F. SUMMARY OF EXPERIMENTS 

Figure 6 summarizes the average f-scores from all the experiments. As 
expected, the worst performance came from using only 50% of the dataset for training. 
Notably, a binary model that records only whether each Wi-Fi AP was detected produced 
results similar to those obtained with the RSSI values. All variations of the experiment 
produced average f-scores between 0.7 and 0.8, showing a definite signal and a reasonable 
probability of identifying a user based on RSSI-based geolocation. 
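The averaged scores in the tables above can be recomputed from the confusion matrices in the appendix, where rows are true labels and columns are inferred labels. The sketch below assumes support-weighted averaging: each subject's precision, recall, and f-score is weighted by that subject's number of test observations (the same convention as scikit-learn's `average="weighted"` option). Applied to the matrix in Table 92 it gives 0.79, 0.73, and 0.72, which matches Run 1 of Table 12; that agreement suggests, but does not confirm, the averaging the thesis used. The function name is hypothetical.

```python
def weighted_prf(matrix):
    """Support-weighted precision, recall, and F1 from a square confusion matrix.

    matrix[i][j] counts test observations whose true label is i and whose
    inferred label is j. Each class score is weighted by its row total.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_sum = r_sum = f_sum = 0.0
    for i in range(n):
        tp = matrix[i][i]
        support = sum(matrix[i])                         # row total: true instances
        predicted = sum(matrix[k][i] for k in range(n))  # column total
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (row total + column total)
        f1 = 2 * tp / (support + predicted) if support + predicted else 0.0
        p_sum += support * precision
        r_sum += support * recall
        f_sum += support * f1
    return p_sum / total, r_sum / total, f_sum / total


# Table 92 (50% training, 50% test, run 1); label order:
# 111, 112, 122, 128, 141, 154, 175, 198, 372.
TABLE_92 = [
    [16, 5, 0, 0, 3, 1, 0, 1, 9],
    [0, 42, 10, 0, 4, 4, 8, 0, 0],
    [0, 0, 17, 0, 0, 0, 9, 0, 0],
    [0, 45, 8, 28, 1, 4, 0, 0, 0],
    [0, 1, 0, 2, 70, 0, 0, 2, 0],
    [0, 8, 5, 0, 1, 89, 0, 0, 0],
    [0, 0, 3, 0, 0, 0, 42, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 12, 0],
    [0, 0, 0, 0, 0, 5, 0, 2, 65],
]

p, r, f = weighted_prf(TABLE_92)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # 0.79 0.73 0.72
```

Under this weighting, average recall equals overall accuracy: 381 of the 523 test observations in Table 92 lie on the diagonal.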


[Figure 6: average f-score for each experiment (y-axis: F-Score)]

V. CONCLUSION AND FUTURE WORK 


A. SUMMARY 

The purpose of this thesis was to evaluate the feasibility of continuously 
authenticating a smartphone user using RSSI-based geolocation. Previous research has 
used either GPS or cell tower geometric triangulation as the geolocation source. Our 
study collected RSSI data from nine NPS students, each over a four-day period. Data 
collection was restricted to the NPS campus and filtered for times when the students were 
on campus. Each RSSI value and its associated Wi-Fi AP were stored as a pair in a sparse 
matrix. The data was divided into 80% training and 20% testing, and an HMM classifier 
was then used to model each user. The experiments yielded precision, recall, and 
f-scores generally between 0.70 and 0.85 for each test. The data shows that RSSI-based 
geolocation could be used to continuously authenticate a smartphone user; however, 
results must be closer to 1.0 to yield the high confidence level required of an 
authentication system. 
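A minimal sketch of turning sparse RSSI/AP pairs into fixed-length feature vectors, in either RSSI form or binary (detected or not detected) form, follows. The row layout (a mapping from AP identifier to RSSI in dBm), the function name `to_vectors`, and the fill value of -100 dBm for undetected APs are illustrative assumptions, not details taken from the thesis.

```python
def to_vectors(rows, ap_order, binary=False, missing_rssi=-100.0):
    """Expand sparse {ap_id: rssi_dbm} rows into fixed-length vectors.

    ap_order fixes the column order. With binary=True an entry is 1 if the AP
    was detected in that row and 0 otherwise. With binary=False the RSSI is
    kept and undetected APs are filled with missing_rssi (an assumed floor).
    """
    vectors = []
    for row in rows:
        if binary:
            vectors.append([1 if ap in row else 0 for ap in ap_order])
        else:
            vectors.append([float(row.get(ap, missing_rssi)) for ap in ap_order])
    return vectors


rows = [{"ap1": -60, "ap2": -85}, {"ap3": -70}]
order = ["ap1", "ap2", "ap3"]
print(to_vectors(rows, order, binary=True))  # [[1, 1, 0], [0, 0, 1]]
print(to_vectors(rows, order))  # [[-60.0, -85.0, -100.0], [-100.0, -100.0, -70.0]]
```

Filling undetected APs with a weak floor value rather than 0 keeps "not detected" at the weak end of the dBm scale, since 0 dBm would read as an extremely strong signal.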

B. FUTURE WORK 

This thesis sets the foundation for future work in continuous authentication of a 
smartphone user. The following are recommendations for future work: 

• Increase the number of research subjects. Only nine students took part in 
this research because of the limited number of smartphones available during 
data collection and the requirement for each research subject to collect data 
for an entire week. 

• Increase the diversity of the research subjects. This research focused on 
data collection from NPS students. The standard course load of an NPS 
student is four classes a day, equating to four hours a day on campus 
unless the student remains on campus between classes or after hours. 
Including professors, teaching assistants, or administrative staff in the 
subject pool could increase the number of data points collected per day. 
• Broaden the physical parameters of the research. During this research, 
data collection was restricted to the NPS campus. When students left campus 
for lunch, medical appointments, or at the end of the day, they paused data 
collection until they returned to campus. Future work could collect data 
outside the NPS campus for better fidelity on a subject's pattern of life 
throughout the day. 

• Combine this research with Lieutenant William Parker's [21] “evaluation 
of data processing techniques for unobtrusive gait authentication” and 
Lieutenant Samuel Fleming's [22] “identification of a smartphone user via 
keystroke analysis.” 

C. CLOSING REMARKS 

Is it possible to authenticate a smartphone user by continuous RSSI-based 
geolocation? With average precision, recall, and f-scores above 0.7, it is feasible to use 
RSSI-based geolocation as an element in combination with other methods to continuously 
authenticate a smartphone user. For an acceptable authentication method, the evaluation 
criteria must be as close to 1.0 as possible. The parameters of this research were tightly 
constrained: NPS students served as the research subjects, and data collection was 
restricted to the NPS campus. A larger and broader data set in future work could raise 
performance to acceptable levels. 

Can we use an HMM to model a user's geolocation throughout the day? If so, can 
we distinguish between individuals? The results of the experiments show that a 
classification model that takes temporal states into consideration, such as an HMM, 
could be used to model a user's geolocation throughout the day. 
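One way an HMM can be used to tell users apart is sketched below. It assumes one discrete-emission HMM has already been trained per user and that each observation window has been reduced to a discrete symbol (for example, the index of the strongest AP); both are illustrative assumptions, and the thesis's own states and emissions may differ. The scaled forward algorithm computes each model's log-likelihood for a sequence, and the sequence is assigned to the user whose model scores highest. The function names and toy parameters are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete-emission HMM.

    start[i]    : P(first hidden state is i)
    trans[i][j] : P(next state is j | current state is i)
    emit[i][o]  : P(observing symbol o | state i)
    Returns log P(obs | model).
    """
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale so values do not underflow
    return total


def classify(obs, models):
    """Return the user whose HMM assigns the sequence the highest log-likelihood."""
    return max(models, key=lambda user: log_likelihood(obs, *models[user]))


# Toy models (hypothetical): user "A" mostly emits symbol 0, user "B" symbol 1.
models = {
    "A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "B": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
print(classify([0, 0, 1, 0], models))  # A
```

Normalizing the forward variables at each step keeps them from underflowing on long daily sequences; the log-likelihood is recovered as the sum of the logs of the scaling factors.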
APPENDIX. CONFUSION MATRICES 


A. CONFUSION MATRIX FOR INITIAL PARAMETERS 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      24    1    0    0    0    1    0    0    6
112       1   46    2    7    0    9    0    0    0
122       0    0   18    0    0    2    3    0    0
128       2    6    7   40   22    6    0    0    0
141       0    5    0    0   65    0    0    2    0
154      11   10    2    1    1   72    3    0    0
175       0    2    4    0    0    0   36    0    0
198       1    0    0    0    0    0    0    9    0
372       2    0    0    1    0    0    0    2   64


Table 13. Confusion Matrix for initial parameters run 2 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      27    0    0    0    0    1    0    1    3
112       0   50    2    8    1    2    1    0    1
122       0    0   20    0    0    0    3    0    0
128       2    5    0   76    0    0    0    0    0
141       0    8    0    6   58    0    0    0    0
154      11   13    3    6    1   64    0    0    2
175       0    0    0    0    0    0   42    0    0
198       0    0    0    0    0    0    0   10    0
372       5    0    0    0    4    0    0    1   59


Table 14. Confusion Matrix for initial parameters run 3 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      20    4    0    0    0    2    0    2    4
112       0   37    2    2    0   15    8    1    0
122       0    0   16    0    0    0    2    5    0
128       0    1    0   74    0    8    0    0    0
141       0    4    0    6   60    0    0    2    0
154       0    6    1    3    0   88    2    0    0
175       0    0   17    0    0    0   25    0    0
198       0    1    0    0    0    0    0    9    0
372       0    0    0    1    0    0    0    5   63


Table 15. Confusion Matrix for initial parameters run 4 
                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      26    1    0    0    0    1    0    2    2
112       4   42    0    5    0   13    0    1    0
122       0    0   18    0    0    0    5    0    0
128       1    5    0   76    0    1    0    0    0
141       1    2    0    5   64    0    0    0    0
154       0    4    1    4    0   91    0    0    0
175       0    5   12    0    0    0   25    0    0
198       0    0    0    0    0    0    0   10    0
372      13    1    0    0    1    4    0    1   49


Table 16. Confusion Matrix for initial parameters run 5 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      24    4    0    0    0    0    0    3    1
112       0   55    0    8    1    1    0    0    0
122       0    0   19    3    0    0    1    0    0
128       0   20    0   60    3    0    0    0    0
141       0    2    0    0   68    0    0    2    0
154       0   16    0    4    1   77    2    0    0
175       0    2   17    9    0    0   14    0    0
198       0    0    0    0    0    0    0   10    0
372       2    4    0    0    0    0    0    2   61


Table 17. Confusion Matrix for initial parameters run 6 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      19    1    0    3    1    3    0    0    5
112       3   52    1    4    0    1    4    0    0
122       0    0   20    0    0    0    3    0    0
128       2    3    0   78    0    0    0    0    0
141       0    4    0    6   62    0    0    0    0
154       1    5    1    8    0   85    0    0    0
175       0    0   13    0    0    0   29    0    0
198       0    0    0    0    0    0    0   10    0
372      10    1    0    0    0    0    0    3   55


Table 18. Confusion Matrix for initial parameters run 7 







                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      26    1    0    1    0    0    0    1    3
112       1   41    4    9    0    9    0    0    1
122       0    0   16    0    0    0    7    0    0
128      22    7    0   50    0    3    0    0    1
141       6    8    0    0   56    0    0    2    0
154       0    8    1    9    1   80    0    0    1
175       0    4    0    0    0    0   38    0    0
198       0    1    0    0    0    0    0    9    0
372       7    4    0    0    0    0    0    1   57


Table 19. Confusion Matrix for initial parameters run 8 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      25    2    0    0    0    0    0    0    5
112       0   52    0    7    6    0    0    0    0
122       0    0   12    0    0    0   11    0    0
128       0   18    0   54   11    0    0    0    0
141       0    0    0    1   69    0    0    2    0
154       0   10    1   10    0   79    0    0    0
175       0    4    8    3    0    0   27    0    0
198       0    1    0    0    0    0    0    9    0
372       1    0    0    1    0    0    0    1   66


Table 20. Confusion Matrix for initial parameters run 9 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      24    1    0    1    0    0    0    1    5
112       5   41    3    7    0    3    4    0    2
122       0    0   22    0    0    0    1    0    0
128       2    6    2   63    0    6    0    0    4
141       1   10    0    5   53    0    0    3    0
154       0    5    3    6    0   84    0    0    2
175       0    0   14    0    0    0   28    0    0
198       0    0    0    0    0    0    0   10    0
372       9    0    0    0    0    0    0    2   58


Table 21. Confusion Matrix for initial parameters run 10 
B. CONFUSION MATRIX FOR BINARY 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      20    5    0    0    0    2    0    0    5
112       1   29    4   11    4    9    5    2    0
122       0    0   23    0    0    0    0    0    0
128       0    0    0   83    0    0    0    0    0
141       0    1    0    6   64    0    0    1    0
154       0    0    2    9    0   89    0    0    0
175       0    0   11    0    0    0   31    0    0
198       0    0    0    0    0    0    0   10    0
372       6    0    0    1    0    4    0    1   57


Table 22. Confusion Matrix for binary run 1 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      25    1    0    0    0    0    0    0    6
112       4   27    0    4    0   21    9    0    0
122       0    0   23    0    0    0    0    0    0
128       3   20    0   57    3    0    0    0    0
141       0    4    0    1   67    0    0    0    0
154       0    6    1    5    0   88    0    0    0
175       0    0    2    0    0    0   40    0    0
198       0    1    0    0    0    0    0    9    0
372       1    0    0    0    0    4    0    0   64


Table 23. Confusion Matrix for binary run 2 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      21    2    0    0    0    0    0    1    8
112       0   34    3    1    8   15    4    0    0
122       0    0   10    0    0    0   13    0    0
128       0   24    8   37   14    0    0    0    0
141       0    0    0    0   70    0    0    2    0
154       0   10    5    1   10   73    1    0    0
175       0    0    0    0    0    0   42    0    0
198       0    0    0    0    0    0    0   10    0
372       0    0    0    0    0    3    0    2   64


Table 24. Confusion Matrix for binary run 3 
Table 25. Confusion Matrix for binary run 4 



Table 26. Confusion Matrix for binary run 5 



Table 27. Confusion Matrix for binary run 6 



Table 28. Confusion Matrix for binary run 7 



Table 29. Confusion Matrix for binary run 8 



Table 30. Confusion Matrix for binary run 9 





Table 39. Confusion Matrix for log 3 run 8 






E. CONFUSION MATRIX FOR LOG 7 


      Inferred Labels
Truth   111  112
111      27    1
112       0   54
122       0    2
128       0   29
141       0    4
154       0   31
175       0    8
198       0    0
372       6    5


Table 52. Confusion Matrix for log 7 run 1 

Inferred Labels 

2 I 122 I 128 141 154 175 

0 0 0 0 _^ 

0 8 12 _ 

15 0 0 4 ^ 

I 0 39 10 4 ^ 

0 0 68 0 _ 

0 5 0 63 l_ 

3 0 0 0 M 

0 0 0 0 _ 

_ 0 I 0 I 0 I 1 I 0~ 

Table 53. Confusion Matrix for log 7 run 2 

      Inferred Labels
Truth   111  112
111      20    1
112       0   48
122       0    0
128       0    6
141       0    9
154       0   14
175       0    8
198       0    1
372       0    1


Table 54. Confusion Matrix for log 7 run 3 




                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      21    0    0    0    6    1    0    0    2
112       1   43    2    1    8    8    0    2    0
122       0    0   21    0    0    0    3    0    0
128       4    5    7   50   11    5    0    0    0
141       0    0    0    0   69    0    0    3    0
154       1    2    3    0    1   93    0    0    0
175       0    3    3    0    0    0   36    0    0
198       0    0    0    0    0    0    0   10    0
372       7    0    0    0    0    0    0    2   61


Table 61. Confusion Matrix for log 7 run 10 


F. CONFUSION MATRIX FOR WINDOW SIZE 10 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      61    1    0    3   20    0    0    1    7
112       2  118   15   15    0   14   12    0    0
122       0    0   61    0    0    0   11    0    0
128       1    6   11  166   28    8    0    0    0
141       0   12    1    0  173    0    0    6    0
154       1   32    7   13   16  174   22    0    0
175       0    1   20    0    0    0   97    0    0
198       0    3    0    0    0    0    0   36    0
372      15    3    0    2    2    0    0    4  161


Table 62. Confusion Matrix for window size 10 run 1 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      48    5    1    2    8    5    0    4   20
112       6   74   24   32    0    9   19    8    4
122       0    0   36    0    0    0   36    0    0
128       0    3   11  195    3    0    0    0    8
141       2   23    0   12  147    0    0    8    0
154       1   13   16   21    0  209    0    0    5
175       0    4    0    0    0   11  103    0    0
198       1    0    0    0    0    0    0   38    0
372       0   12    0    0    0    0    1    6  168


Table 63. Confusion Matrix for window size 10 run 2 










                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      27    2    0    0    0    0    0    1    2
112       0   51    2    1    6    2    3    0    0
122       0    1   19    0    0    0    3    0    0
128       0   17    1   54   11    0    0    0    0
141       0    0    0    0   70    0    0    2    0
154       0   24    1    0    1   74    0    0    0
175       0    1    0    0    0    4   37    0    0
198       0    1    0    0    0    0    0    9    0
372       7    0    0    0    0    0    0    2   60


Table 70. Confusion Matrix for window size 10 run 9 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      18    2    0    1    0    0    0    2    9
112       0   45    3    9    6    2    0    0    0
122       0    1   16    0    0    0    6    0    0
128       2    3    0   60   14    4    0    0    0
141       0    0    0    0   72    0    0    0    0
154       1   15    1   10    1   70    1    1    0
175       0    5   12    0    0    0   25    0    0
198       0    1    0    0    0    0    0    9    0
372       0    0    0    0    4    1    0    2   62


Table 71. Confusion Matrix for window size 10 run 10 


G. CONFUSION MATRIX FOR WINDOW SIZE 15 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      38    1    0    2    0    0    0    4   14
112       2   57    6   14   12    0   14   10    0
122       0    0   21    0    0    0   14    9    0
128       0    1   14   98   27    0    0    4    0
141       0    0    0    1  122    0    0    2    0
154       1    3    8   18    1  134    0    9    0
175       0    0   16    0    0    0   61    0    0
198       0    0    0    0    0    0    0   23    0
372       3    0    0    0    0    0    0   15  103


Table 72. Confusion Matrix for window size 15 run 1 





Table 81. Confusion Matrix for window size 15 run 10 


H. CONFUSION MATRIX FOR WINDOW SIZE 20 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      28    0    0    0    7    0    0    1    6
112       4   43    5    9    8    4    7    2    2
122       0    0   21    0    0    0   10    0    0
128       0   12    1   71   21    1    0    0    0
141       0    0    0    0   87    0    0    4    0
154       1    7    3    6    0  106    2    1    2
175       0    0    5    0    0    0   49    1    0
198       0    0    0    0    0    0    0   15    0
372       0    1    0    0    0    0    0    0   88


Table 82. Confusion Matrix for window size 20 run 1 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      28    0    0    0    6    0    0    2    6
112       0   42    1    5    6   27    0    2    1
122       0    0   19    0    0    5    7    0    0
128       3    4    1   52   28   17    1    0    0
141       0    1    0    0   90    0    0    0    0
154       1    8    2    1    1  114    0    0    1
175       0    4    7    0    0    0   44    0    0
198       0    0    0    0    0    0    0   15    0
372       3    1    0    0    0    0    0    3   82


Table 83. Confusion Matrix for window size 20 run 2 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      25    1    0    0    6    1    0    2    7
112       4   49   11   11    1    2    0    6    0
122       0    0   20    0    0    0   11    0    0
128       0    7    6   56   27   10    0    0    0
141       0   13    0    0   75    0    0    3    0
154       7   40    2    3    1   72    1    2    0
175       0    4    3    0    0    6   42    0    0
198       0    0    0    0    0    0    0   15    0
372       0    1    0    0    0    2    0    3   83


Table 84. Confusion Matrix for window size 20 run 3 









Table 87. Confusion Matrix for window size 20 run 6 



                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      26    0    0    1    0    1    0    0    4
112       5   31    0   13    7    6    0    1    2
122       0    0   22    0    0    0    1    0    0
128       0   12    0   71    0    0    0    0    0
141       1    0    0    5   64    0    0    2    0
154       0   14    0    7    0   77    0    0    2
175       0    4   10    0    0    0   28    0    0
198       0    0    0    0    0    0    0   10    0
372       2    0    0    0    0    0    0    1   66


Table 91. Confusion Matrix for window size 20 run 10 


I. CONFUSION MATRIX FOR 50% TRAINING, 50% TEST 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      16    5    0    0    3    1    0    1    9
112       0   42   10    0    4    4    8    0    0
122       0    0   17    0    0    0    9    0    0
128       0   45    8   28    1    4    0    0    0
141       0    1    0    2   70    0    0    2    0
154       0    8    5    0    1   89    0    0    0
175       0    0    3    0    0    0   42    0    0
198       0    1    0    0    0    0    0   12    0
372       0    0    0    0    0    5    0    2   65


Table 92. Confusion Matrix for 50% training, 50% test run 1 


                     Inferred Labels
Truth   111  112  122  128  141  154  175  198  372
111      29    0    0    0    0    2    0    1    3
112       2   32    1   14    2    8    9    0    0
122       0    0   10    9    0    0    7    0    0
128       5    2    0   77    0    2    0    0    0
141       4   10    0    2   57    0    0    2    0
154       0    7    1    8    1   85    1    0    0
175       0    0   11    0    0    0   34    0    0
198       0    0    0    0    0    0    0   13    0
372       3    0    0    0    1    0    0    0   68


Table 93. Confusion Matrix for 50% training, 50% test run 2 







      Inferred Labels
Truth   111  112
111      24    1
112       0   57
122       0    0
128       0   25
141       0   10
154       0   29
175       0    7
198       0    1
372       5    5


Table 94. Confusion Matrix for 50% training, 50% test run 3 


Inferred Labels 


Truth 

111 

112 

111 

24 

0 

112 

4 

45 

122 

0 

0 

128 

1 

8 

141 

0 

10 

154 

42 

3 

175 

0 

0 

198 

1 

0 

372 

4 
Table 95. Confusion Matrix for 50% training, 50% test run 4 


      Inferred Labels
Truth   111  112
111      27    1
112       9   29
122       0    0
128       1   47
141       0    0
154       1   12
175       0    2
198       0    0
372       1    0